    Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Abstract
    Background: Apicomplexan parasites are causative agents of various diseases, including malaria, and have been targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexan parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, and to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexan parasites.
    Results: In this study, we used a total of 61,056 5'-end single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partial cDNA sequences with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of previously unidentified transcripts. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the level of complete full-length transcripts. We found that 41% of the T. gondii gene models contained at least one inconsistency. In addition, we identified 140 previously unidentified transcripts in the intergenic regions of the current gene annotations and confirmed them by RT-PCR. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes.
    Conclusion: Our data indicate that the current gene models are likely still incomplete and leave much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations of apicomplexan parasite genomes.
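
    The exon-support comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes that annotated exons and cDNA-derived exons are already available as (start, end) intervals on the same chromosome and strand, and it counts an annotated exon as supported if any cDNA exon overlaps it by at least one base. That overlap rule and all function names are hypothetical simplifications, not the criterion used in the study; in particular, it ignores that 5'-end single-pass reads cover only part of each transcript.

```python
from typing import List, Tuple

# (start, end) in 1-based, inclusive genomic coordinates (an assumption of this sketch)
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def unsupported_exons(model_exons: List[Interval], cdna_exons: List[Interval]) -> List[Interval]:
    """Annotated exons that no cDNA-derived exon overlaps (hypothetical support rule)."""
    return [e for e in model_exons if not any(overlaps(e, c) for c in cdna_exons)]


def fraction_unsupported(model_exons: List[Interval], cdna_exons: List[Interval]) -> float:
    """Share of annotated exons lacking cDNA support; 0.0 for an empty model."""
    if not model_exons:
        return 0.0
    return len(unsupported_exons(model_exons, cdna_exons)) / len(model_exons)


if __name__ == "__main__":
    # Toy gene model with an extra predicted upstream exon at 100-180
    model = [(100, 180), (400, 520), (700, 900)]
    cdna = [(410, 520), (700, 880)]
    print(unsupported_exons(model, cdna))  # [(100, 180)]
    print(round(fraction_unsupported(model, cdna), 3))  # 0.333
```

    The toy example mirrors the most common discrepancy the abstract reports: an extra predicted exon upstream of the gene that no cDNA supports.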

    Comparasite: a database for comparative study of transcriptomes of parasites defined by full-length cDNAs

    Comparasite is a database for comparative studies of the transcriptomes of parasites. In this database, each entry is defined by full-length cDNAs from various apicomplexan parasites. It integrates the seven individual Full-Parasites databases, which consist of numerous full-length cDNA clones that we have produced and sequenced: 12 484 cDNA sequences from Plasmodium falciparum, 11 262 from Plasmodium yoelii, 9633 from Plasmodium vivax, 1518 from Plasmodium berghei, 7400 from Toxoplasma gondii, 5921 from Cryptosporidium parvum and 10 966 from the tapeworm Echinococcus multilocularis. Putative counterpart gene groups are clustered, and comparative analysis of any combination of the six apicomplexan species is implemented, including interspecies comparisons of protein motifs (InterPro), predicted subcellular localization signals (PSORT), transmembrane regions (SOSUI) and upstream promoter elements. Given keywords and other search conditions, Comparasite retrieves putative counterpart gene groups that either share a given feature or carry it in a species-specific manner. By enabling multi-faceted comparative analyses of the genes of apicomplexan protozoa, monophyletic organisms that have diversified to parasitize various hosts by adopting complex life cycles, Comparasite should help elucidate the mechanisms behind parasitism. Our full-length cDNA databases and Comparasite are accessible from
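
    The two query modes described above, a feature held in common across a counterpart gene group versus a feature found in only one species, can be sketched in Python as follows. The nested-dictionary schema, the placeholder feature labels and the function names are hypothetical and do not reflect Comparasite's actual data model or interface.

```python
from typing import Dict, List, Set

# Hypothetical schema (not Comparasite's actual one):
# group ID -> species name -> set of feature labels attached to that species' member
GeneGroups = Dict[str, Dict[str, Set[str]]]


def common_feature_groups(groups: GeneGroups, feature: str) -> List[str]:
    """IDs of groups in which every member species carries `feature`."""
    return sorted(
        gid
        for gid, members in groups.items()
        if members and all(feature in feats for feats in members.values())
    )


def species_specific_groups(groups: GeneGroups, feature: str, species: str) -> List[str]:
    """IDs of groups in which `species` is the only member carrying `feature`."""
    hits = []
    for gid, members in groups.items():
        carriers = {sp for sp, feats in members.items() if feature in feats}
        if carriers == {species}:
            hits.append(gid)
    return sorted(hits)


if __name__ == "__main__":
    # Toy data with placeholder feature labels, for illustration only
    groups: GeneGroups = {
        "G1": {"P_falciparum": {"motif_A", "signal_peptide"},
               "P_vivax": {"motif_A"},
               "T_gondii": {"motif_A"}},
        "G2": {"P_falciparum": {"motif_B", "signal_peptide"},
               "C_parvum": {"motif_B", "transmembrane"}},
    }
    print(common_feature_groups(groups, "motif_A"))  # ['G1']
    print(species_specific_groups(groups, "signal_peptide", "P_falciparum"))  # ['G1', 'G2']
```

    Keeping features as per-species sets makes both queries simple set tests, which is one plausible way to support arbitrary combinations of species.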

    DBTSS: DataBase of Human Transcription Start Sites, progress report 2006

    DBTSS was first constructed in 2002 based on precise, experimentally determined 5′ end clones. Several major updates and additions have been made since the last report. First, the number of human clones has increased drastically, from 190 964 to 1 359 000. Second, information about potential alternative promoters is now presented, because the number of 5′ end clones is sufficient to determine several promoters for one gene. Specifically, we defined putative promoter groups by clustering transcription start sites (TSSs) separated by fewer than 500 bases. A total of 8308 human genes and 4276 mouse genes were found to have putative multiple promoters. Third, DBTSS provides detailed sequence comparisons of user-specified TSSs. Finally, we have added TSS information for zebrafish, malaria and schyzon (a model red alga). DBTSS is accessible at
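
    The promoter-grouping rule above (TSSs separated by fewer than 500 bases fall into one putative promoter group) can be read as single-linkage clustering of sorted TSS positions. The Python sketch below assumes that reading, and assumes all positions belong to one gene on one strand of one chromosome; the exact DBTSS procedure may differ, and the function names are hypothetical.

```python
from typing import List


def cluster_tss(positions: List[int], max_gap: int = 500) -> List[List[int]]:
    """Group sorted TSS positions; a TSS joins the current group if it lies
    fewer than `max_gap` bases from the previous TSS (single linkage)."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def has_multiple_promoters(positions: List[int], max_gap: int = 500) -> bool:
    """A gene is flagged as having putative multiple promoters if its TSSs form >1 group."""
    return len(cluster_tss(positions, max_gap)) > 1


if __name__ == "__main__":
    tss = [1000, 1020, 1300, 5000, 5100]  # toy TSS positions for one gene
    print(cluster_tss(tss))  # [[1000, 1020, 1300], [5000, 5100]]
    print(has_multiple_promoters(tss))  # True
```

    Note that under single linkage a chain of closely spaced TSSs can merge into one group spanning more than 500 bases; a fixed-window rule would behave differently.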

    Altered expression of testis-specific genes, piRNAs, and transposons in the silkworm ovary masculinized by a W chromosome mutation

    Abstract
    Background: In the silkworm, Bombyx mori, femaleness is strongly controlled by the female-specific W chromosome. Originally, it was presumed that the W chromosome encodes female-determining gene(s), accordingly called Fem. However, to date, neither Fem nor any protein-coding gene has been identified on the W chromosome. Instead, the W chromosome is occupied by numerous transposon-related sequences. Interestingly, the silkworm W chromosome is a source of female-enriched PIWI-interacting RNAs (piRNAs). piRNAs are small RNAs 23-30 nucleotides in length that are required for controlling transposon activity in animal gonads. A recent study identified a novel mutant silkworm line, called KG, whose mutation in the W chromosome causes severe female masculinization. However, the molecular nature of the KG line has not yet been well characterized.
    Results: Here we characterize the KG line at the molecular level. Genomic PCR analyses using currently available W chromosome-specific PCR markers indicated that no large deletion existed in the KG W chromosome. Genetic analyses demonstrated that sib-crosses within the KG line suppressed masculinization, and that masculinization was reactivated when KG females were crossed with wild-type males. Importantly, the KG ovaries exhibited a significantly abnormal transcriptome. First, the KG ovaries misexpressed testis-specific genes. Second, a set of female-enriched piRNAs was downregulated in the KG ovaries. Third, several transposons were overexpressed in the KG ovaries.
    Conclusions: Collectively, the mutation in the KG W chromosome causes broadly altered expression of testis-specific genes, piRNAs, and transposons. To our knowledge, this is the first study to describe a W chromosome mutant with such an intriguing phenotype.

    Genome-wide identification and annotation of HIF-1α binding sites in two cell lines using massively parallel sequencing

    We identified 531 and 616 putative HIF-1α target sites by ChIP-Seq in the cancerous cell line DLD-1 and the non-cancerous cell line TIG-3, respectively. We also examined the positions and expression levels of transcription start sites (TSSs) in these cell lines using our TSS-Seq method. We observed that 121 and 48 genes in DLD-1 and TIG-3 cells, respectively, had HIF-1α binding sites in the proximal regions of previously reported TSSs that were up-regulated at the transcriptional level. In addition, 193 and 123 of the HIF-1α target sites, respectively, were located in the proximal regions of previously uncharacterized TSSs, namely TSSs of putative alternative promoters of protein-coding genes or promoters of putative non-protein-coding transcripts. The hypoxic response of DLD-1 cells was more pronounced than that of TIG-3 cells, with respect to both the number of target sites and the degree of induced changes in transcript expression. Nucleosome-Seq and ChIP-Seq analyses of histone modifications revealed that the chromatin formed an open structure in the regions surrounding the HIF-1α binding sites, but that this open structure was established before HIF-1α actually bound. Different cellular histories may be encoded by chromatin structures and may determine the activation of specific genes in response to hypoxic shock. Electronic supplementary material: The online version of this article (doi:10.1007/s11568-011-9150-9) contains supplementary material, which is available to authorized users.
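
    The split of binding sites into those near previously reported TSSs and those near previously uncharacterized TSSs can be sketched as a nearest-neighbour lookup. The abstract does not define "proximal", so the 1,000-base window below is a hypothetical parameter, as is the rule of checking known TSSs before novel ones; the sketch also assumes sorted positions on a single chromosome and ignores strand.

```python
from bisect import bisect_left
from typing import List, Optional


def nearest_distance(site: int, tss: List[int]) -> Optional[int]:
    """Distance from `site` to the closest position in the sorted list `tss` (None if empty)."""
    if not tss:
        return None
    i = bisect_left(tss, site)
    candidates = []
    if i < len(tss):
        candidates.append(tss[i] - site)  # closest TSS at or after the site
    if i > 0:
        candidates.append(site - tss[i - 1])  # closest TSS before the site
    return min(candidates)


def classify_site(site: int, known_tss: List[int], novel_tss: List[int],
                  window: int = 1000) -> str:
    """Label a binding site; `window` is a hypothetical proximity cutoff in bases."""
    d_known = nearest_distance(site, known_tss)
    if d_known is not None and d_known <= window:
        return "known_tss_proximal"
    d_novel = nearest_distance(site, novel_tss)
    if d_novel is not None and d_novel <= window:
        return "novel_tss_proximal"
    return "distal"


if __name__ == "__main__":
    known, novel = [10_000, 50_000], [30_000]  # toy, sorted TSS positions
    for s in (10_400, 30_800, 70_000):
        print(s, classify_site(s, known, novel))
```

    Binary search keeps each lookup logarithmic in the number of TSSs, which matters when TSS-Seq yields many tag clusters per chromosome.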

    H-DBAS: Alternative splicing database of completely sequenced and manually annotated full-length cDNAs based on H-Invitational

    The Human-transcriptome DataBase for Alternative Splicing (H-DBAS) is a specialized database of alternatively spliced human transcripts. In this database, each alternative splicing (AS) variant corresponds to a completely sequenced and carefully annotated human full-length cDNA, one of those collected for the H-Invitational human-transcriptome annotation meeting. In total, H-DBAS contains 38 664 representative alternative splicing variants (RASVs) in 11 744 loci. The data are retrievable by various manually annotated features of AS, such as AS patterns, the consequent alterations in the encoded amino acids and affected protein motifs, GO terms, predicted subcellular localization signals and transmembrane domains. The database also records recently identified, very complex patterns of AS in which two distinct genes seem to be bridged, nested or degenerated (multiple CDS): in all three cases, completely unrelated proteins are encoded by a single locus. Using AS Viewer, each AS event can be analyzed in the context of full-length cDNAs, helping the user understand empirically how an AS event relates to the consequent alterations in the encoded amino acid sequence and to the various affected protein motifs. H-DBAS is accessible at

    Cost-Effective Sequencing of Full-Length cDNA Clones Powered by a De Novo-Reference Hybrid Assembly

    Sequencing full-length cDNA clones is important for determining gene structures, including alternative splice forms, and provides valuable resources for experimental analyses that reveal the biological functions of the encoded proteins. However, previous approaches to sequencing cDNA clones were expensive or time-consuming, and a fast and efficient sequencing approach was therefore needed. […] to confirm that its performance was adequate even for non-human species. The entire sequencing and shotgun assembly takes less than 1 week, and the consumables cost only ∼US$3 per clone, a significant advantage over previous approaches.